ABSTRACT

Erroneous repair of DNA double-strand breaks by 
homologous recombination (HR) leads to loss of 
heterozygosity (LOH). Analysing 22392 and 74415 
LOH events in 363 glioblastoma and 513 ovarian 
cancer samples, respectively, and using three differ- 
ent metrics, we report that LOH selectively occurs in 
early replicating regions; this pattern differs from 
the trends for point mutations and somatic dele- 
tions, which are biased toward late replicating 
regions. Our results are independent of BRCA1 
and BRCA2 mutation status. The LOH events are 
significantly clustered near RNA polII-bound
transcription start sites, consistent with the reports
that slow replication near paused RNA polII might
initiate HR-mediated repair. The frequency of LOH
events is higher in the chromosomes with shorter
inter-homolog distance inside the nucleus. We
propose that during early replication, HR-mediated
rescue of replication near paused RNA polII using
homologous chromosomes as template leads to 
LOH. The difference in the preference for replication 
timing between different classes of genomic altera- 
tions in cancer genomes also provokes a testable 
hypothesis that replicating cells show changing 
preference between various DNA repair pathways, 
which have different levels of efficiency and 
fidelity, as the replication progresses. 



INTRODUCTION

Loss of heterozygosity (LOH) is a common class of
genomic alterations observed in cancer genomes, which
occurs due to heterozygous deletion of one allele, or
duplication of a maternal or paternal chromosome or
chromosomal region and concurrent loss of the other
allele; the latter is known as copy neutral LOH or
uniparental disomy. Copy neutral LOH events arise via
homologous recombination (HR), a DNA double-strand
break repair pathway (1). HR is active during and shortly
after DNA replication, when sister chromatids and
homologous chromosomes are easily available (2). DNA
replication is spatially segregated such that some genomic
regions are replicated early and others later during S
phase (3). It was recently demonstrated that local DNA
replication timing (RT) affects the patterns of point
mutations (4-6), somatic copy number alterations (4,7,8) and
rearrangements (9) in cancer and normal genomes: late
replicating regions accumulate more mutations than early
replicating regions (10). These findings prompt the
question of whether LOH events, which are primarily
replication-dependent phenomena, also show distinct
patterns in the context of DNA RT.

Here, integrating genomic alteration data for 597
glioblastoma (GBM) (11) and 591 ovarian cystadenoma (12)
samples from The Cancer Genome Atlas (TCGA), and DNA
RT data for multiple cell types (3), we survey the RT
pattern of the genomic regions affected by LOH events,
and discuss the findings in the context of the temporal
expression pattern of the genes involved in the HR- and
non-homologous end-joining (NHEJ)-mediated repair.
We then compare and contrast the RT preference for
LOH events with that for point mutations and somatic
copy number alterations in cancer genomes. We further
analyse the findings in the context of factors that are
known to contribute to replication stress during early
replication, and also the nuclear localization of homologous
chromosome pairs. Finally, we conclude by discussing our
findings in light of erroneous HR-mediated repair during
early replication.



MATERIALS AND METHODS 

We mapped all data sets to human reference genome 
version hg18. Various genomic and epigenomic features
were downloaded from the UCSC genome browser (13) 
as appropriate. 

DNA RT data set 

We obtained RT data measured using a massively parallel 
sequencing-based technique across multiple human cell 
types from Hansen et al. (3). In that study, the RT of
different genomic regions was categorized as 'constant
early', 'constant mid', 'constant late' and 'variable across
cell types'. Some regions had no RT assigned because of 
coverage, mappability and other technical issues. We 
focused on genomic regions that had constant early and 
constant late RT across several human cell types through- 
out this article. Constant early and constant late RT 
regions covered 585.13 and 521.14Mb of the genome, 
respectively. The remaining regions are termed as 
'other RT' regions. 

LOH and other genomic alterations data sets 

We have obtained genomic data for 597 GBM (11) and 
591 ovarian cystadenoma samples (12) from TCGA. LOH 
status for the GBM and ovarian cancer samples was 
analysed using Illumina HumanHapMap550K and 
Human1M-Duo microarrays, respectively, and processed
by the Hudson Alpha Institute for Biotechnology using 
published protocols (11,12). The somatic copy number al- 
teration data for the same samples were obtained from 
TCGA (11,12). We excluded the samples with potential 
systematic biases, and also the LOH events that were 
likely to occur via heterozygous deletion (Supplementary 
Module SM1), using our previously published approach
(14). Our final data set had 22 392 and 74 415 LOH events
in 363 GBM and 513 ovarian cancer samples, respectively. 

Analytical approach and estimation of statistical 
significance 

We used Bedtools (15) for calculating overlap between
two genomic features (e.g. LOH and early replicating
regions; getOverlap function) and for estimating
intersection between multiple features (multiIntersectBed
function). Some genomic regions did not have any RT 
assigned because of mappability, coverage and other tech- 
nical issues. Hence, often some LOH end points did not 
have any RT assigned, but the genomic regions in their 
proximity did. To maximize biologically relevant overlap
between the data sets, we considered a window of 1 kb
centred on each LOH end point and assigned the RT of
that window as the RT of that end point.
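
The window-based assignment described above can be illustrated with a minimal Python sketch. The function name `assign_rt`, the `(start, end, label)` tuple layout and the decision to leave a window unassigned when it touches both early and late regions are assumptions made for this example; the original analysis used Bedtools, and its handling of ambiguous windows is not stated.

```python
def assign_rt(position, rt_intervals, window=1000):
    """Assign an RT label to an LOH end point using a window centred on it.

    rt_intervals: list of (start, end, label) tuples for one chromosome,
    with half-open coordinates [start, end). Hypothetical layout.
    """
    half = window // 2
    win_start, win_end = position - half, position + half
    # Collect the labels of every annotated interval that overlaps the window.
    labels = {label for start, end, label in rt_intervals
              if start < win_end and end > win_start}
    if len(labels) == 1:
        return labels.pop()
    # Assumption: no nearby annotation, or conflicting annotations,
    # leaves the end point unassigned.
    return None


rt = [(0, 10_000, "early"), (50_000, 60_000, "late")]
print(assign_rt(10_300, rt))  # end point in an unannotated gap, close to the early region
print(assign_rt(30_000, rt))  # end point far from any annotated region
```

An end point that falls in an unannotated gap therefore inherits the RT of a region lying within 500 bp on either side, which is the effect the 1 kb window is meant to have.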

We calculated (i) the observed (or expected) proportion
of LOH end points in early RT regions as:

$$\frac{\text{number of LOH end points in early RT regions}}{\text{number of LOH end points in early RT regions} + \text{number of LOH end points in late RT regions}}$$

(ii) the observed (or expected) proportion of LOH
events with both end points in early RT regions as:

$$\frac{\text{number of LOH events with both end points in early RT regions}}{\text{number with both end points in early RT} + \text{number with both end points in late RT} + \text{number with one end point in early RT and the other in late RT}}$$

and (iii) the observed (or expected) proportion of the
length of LOH events in early RT regions as:

$$\frac{\text{number of base pairs of LOH overlapping with early RT regions}}{\text{number of base pairs of LOH overlapping with early RT regions} + \text{number of base pairs of LOH overlapping with late RT regions}}$$

We found that excluding the LOH end points and stretches of
genomic regions affected by LOH events that reside in
other_RT regions provides a more meaningful
interpretation of the observed preference for early (or late) RT
regions and its statistical significance, compared with the
cases where other_RT regions were included in the
analysis.
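
The three proportions can be written compactly in code. The sketch below is only an illustration under assumed inputs: end points are represented by the labels 'early', 'late' or 'other', events by a pair of such labels, and base-pair overlaps are supplied as precomputed totals. The function names are hypothetical.

```python
def proportion_endpoints_early(endpoint_labels):
    """Metric (i): early end points / (early + late end points)."""
    early = endpoint_labels.count("early")
    late = endpoint_labels.count("late")
    return early / (early + late)


def proportion_events_both_early(event_labels):
    """Metric (ii): events with both end points early, over all events whose
    two end points are each early or late. event_labels: list of (left, right)."""
    both_early = both_late = mixed = 0
    for left, right in event_labels:
        if left == right == "early":
            both_early += 1
        elif left == right == "late":
            both_late += 1
        elif {left, right} == {"early", "late"}:
            mixed += 1
        # Events touching 'other' RT regions are excluded, as in the text.
    return both_early / (both_early + both_late + mixed)


def proportion_length_early(bp_early, bp_late):
    """Metric (iii): LOH base pairs in early RT / (early + late base pairs)."""
    return bp_early / (bp_early + bp_late)


print(proportion_endpoints_early(["early", "early", "early", "late", "other"]))
```

In each case the other_RT category drops out of both numerator and denominator, so the value reads directly as a preference for early over late RT.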

We estimated statistical significance of the observed 
overlaps between LOH and RT patterns using permuta- 
tion analysis with 10000 iterations. It was shown that per- 
mutation allows preservation of higher-order genomic 
structures, and hence provides a more realistic P-value 
compared with other statistical tests. During the permuta- 
tion analysis, we performed genome-wide shuffling using 
the shuffleBed function of the Bedtools (15) with default 
seed and other parameters, and also keeping the length of 
the LOH events unchanged. We also used two alternative 
permutation strategies: shuffleBed with the -chrom 
option to permute the LOH events within respective 
chromosomes, and shuffleBed with the -chrom and 
-excl options to permute the LOH events within
respective chromosomes, after excluding selected (e.g.
centromeric) regions.
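
The logic of the permutation test can be sketched in plain Python. This is not the Bedtools workflow used in the study: the random relocation below stands in for shuffleBed on a single toy coordinate space, the statistic is the length-based proportion (metric iii), and the `(k + 1) / (n + 1)` form of the empirical P-value is an assumption of this example, since the exact estimator is not stated in the text. All function names are hypothetical.

```python
import random


def overlap_bp(start, end, regions):
    """Base pairs of [start, end) covered by a list of non-overlapping regions."""
    return sum(max(0, min(end, r_end) - max(start, r_start))
               for r_start, r_end in regions)


def early_fraction(events, early, late):
    """Proportion of LOH length in early RT among early + late RT."""
    e = sum(overlap_bp(s, t, early) for s, t in events)
    l = sum(overlap_bp(s, t, late) for s, t in events)
    return e / (e + l) if e + l else 0.0


def permutation_p_value(events, early, late, genome_length, n_iter=1000, seed=0):
    """Shuffle events across the coordinate space, keeping their lengths,
    and count how often the shuffled statistic reaches the observed one."""
    rng = random.Random(seed)
    observed = early_fraction(events, early, late)
    at_least = 0
    for _ in range(n_iter):
        shuffled = []
        for s, t in events:
            length = t - s
            new_start = rng.randrange(0, genome_length - length + 1)
            shuffled.append((new_start, new_start + length))
        if early_fraction(shuffled, early, late) >= observed:
            at_least += 1
    # Assumed estimator: add-one correction so the P-value is never exactly 0.
    return observed, (at_least + 1) / (n_iter + 1)


early = [(0, 100_000)]
late = [(100_000, 1_000_000)]
events = [(i * 1000, i * 1000 + 500) for i in range(20)]
observed, p = permutation_p_value(events, early, late, 1_000_000, n_iter=200)
print(observed, p)
```

Restricting the shuffle to the chromosome of origin, or excluding selected regions, would change only how `new_start` is drawn.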

Cell cycle-related gene expression 

We obtained data on dynamic expression patterns of the 
genes during the cell cycle from multiple independent ex- 
periments in baker's yeast (16-19) and human cell lines 
(20) as deposited in Cyclebase 2.0 (21). Peak time, period- 
icity and regulation of these genes were calculated using 
methods proposed by Gauthier et al. (22), and archived in 
the database. In brief, the P(per) was defined (22) as the 
chance of observing as great a periodicity by random 
shuffling of the individual time-point values of the expres- 
sion profile. First, a Fourier score was obtained for each 
gene profile. Next, simulated profiles were generated from 
random shuffling of the data within the original profile 1 
million times. The relative proportion of simulated profiles 
whose Fourier scores were greater than or equal to the 
gene's true Fourier score was reported as the P(per). 
Due to the normalization techniques used by Gauthier 
et al. (22), P(per) can take values >1. A small P(per) 
indicated a highly periodic pattern of expression. If the 
expression data for a given gene were available from 
multiple experiments, the P(per) from individual experi- 
ments were multiplied to generate the final P(per). 



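$$F_i = \sqrt{\left(\sum_t \sin(\omega t)\, x_i(t)\right)^2 + \left(\sum_t \cos(\omega t)\, x_i(t)\right)^2}$$

where $\omega = 2\pi/(\text{interdivision time})$ and $x_i(t)$ is the expression value of gene $i$ at time $t$.
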
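
The periodicity calculation described above (a Fourier score compared against scores from shuffled profiles) can be sketched as follows. This is a simplified illustration, not the Cyclebase implementation: it uses far fewer than 1 million shuffles and omits the normalization step that allows P(per) to exceed 1. The function names and the toy profile are assumptions.

```python
import math
import random


def fourier_score(times, values, period):
    """Magnitude of the profile's Fourier component at the cell-cycle frequency."""
    omega = 2 * math.pi / period
    s = sum(math.sin(omega * t) * x for t, x in zip(times, values))
    c = sum(math.cos(omega * t) * x for t, x in zip(times, values))
    return math.sqrt(s * s + c * c)


def periodicity_p_value(times, values, period, n_iter=1000, seed=0):
    """Fraction of shuffled profiles whose Fourier score is at least as large
    as that of the real profile (un-normalized)."""
    rng = random.Random(seed)
    true_score = fourier_score(times, values, period)
    shuffled = list(values)
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(shuffled)  # permute the time-point values within the profile
        if fourier_score(times, shuffled, period) >= true_score:
            hits += 1
    return hits / n_iter


times = list(range(24))                                    # toy sampling times
profile = [math.sin(2 * math.pi * t / 12) for t in times]  # perfectly periodic toy gene
print(fourier_score(times, profile, 12))
print(periodicity_p_value(times, profile, 12, n_iter=200))
```

When a gene has profiles from several experiments, the text states that the per-experiment values are multiplied to give the final P(per).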

The P(reg) was defined (22) as an estimate of the
magnitude of variance between experiments. First, for a given
gene the standard deviation was obtained for the log-ratio 
profile. Then simulated profiles were created from the 
global distribution for 1 million iterations. The proportion 
of shuffled profiles whose standard deviations were greater 
than or equal to the gene's standard deviation was 
calculated, and normalized to create the final P(reg). 
Due to the normalization techniques used by Gauthier
et al. (22), P(reg) can take values >1. A small P-value
for regulation indicated low variance and a strongly 
regulated gene. 

Peak time was calculated as a percentage, with both 0 
and 100 representing the M/G1 transition phase during
the cell cycle. To compute a peak time for a single gene 
across all available experiments, a sine wave was fitted to 
the combined expression profile, and the time scale was 
'shifted' such that time was represented as a fraction of the 
cell cycle. In those cases where the expression pattern 
lacked periodicity at the cell cycle time scale, or the ex- 
pression pattern between experiments was inconsistent, 
the peak time was reported as 'Uncertain'. 
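
One way to picture the peak-time calculation is a least-squares fit of a single sine wave at the cell-cycle frequency, followed by conversion of the fitted phase to a percentage of the cycle. The sketch below is a hedged illustration only: it assumes that time zero already corresponds to the M/G1 transition, it centres the profile by subtracting its mean, and it uses a hypothetical amplitude threshold (`min_amplitude`) as a stand-in for the 'Uncertain' call. None of these details are taken from Gauthier et al. (22).

```python
import math


def peak_time_percent(times, values, period, min_amplitude=1e-9):
    """Fit x(t) ~ a*sin(wt) + b*cos(wt) to the mean-centred profile and
    return the peak position as a percentage of the cell cycle."""
    omega = 2 * math.pi / period
    mean = sum(values) / len(values)
    centred = [x - mean for x in values]
    sines = [math.sin(omega * t) for t in times]
    cosines = [math.cos(omega * t) for t in times]
    # Normal equations of the two-parameter least-squares problem.
    sss = sum(s * s for s in sines)
    scc = sum(c * c for c in cosines)
    ssc = sum(s * c for s, c in zip(sines, cosines))
    sxs = sum(x * s for x, s in zip(centred, sines))
    sxc = sum(x * c for x, c in zip(centred, cosines))
    det = sss * scc - ssc * ssc
    if det == 0:
        return None
    a = (sxs * scc - sxc * ssc) / det
    b = (sxc * sss - sxs * ssc) / det
    if math.hypot(a, b) < min_amplitude:
        return None  # no detectable periodic component (assumed 'Uncertain')
    # a*sin(wt) + b*cos(wt) = R*cos(wt - phase), which peaks at wt = phase.
    phase = math.atan2(a, b)
    return 100.0 * ((phase / (2 * math.pi)) % 1.0)


times = [i * 0.5 for i in range(48)]
profile = [2.0 + math.cos(2 * math.pi * (t - 3.0) / 12.0) for t in times]
print(peak_time_percent(times, profile, 12.0))  # toy profile peaking a quarter of the way through the cycle
```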

Genomic and epigenomic features associated with 
replication stress 

We analysed 76 common fragile sites (23), early replicating 
fragile sites (24), human genes obtained from Ensembl v54 
(25), transcription start sites as in Ensembl v54 (25), and
the sites of RNA polII occupancy in GM12878, HUVEC,
HeLa and K562 cell lines (13,26,27). Because transcription
start sites are a single base-pair wide, we considered a
window of ±5 kb while testing for overlap in both
observed and expected cases. The regions marked as
'standard peaks' (StdPk.NarrowPk track from the
ENCODE/Stanford/Yale/USC/Harvard group) were
chosen as the sites of RNA polII occupancy in the four
ENCODE cell lines (26,27). 

Distance between homologous chromosomes 

We obtained the data on the distance between homolo- 
gous chromosomes in the EJ-30 human epithelial cancer 
cell line from Heride et al. (28). In brief, the authors used 
fluorescence in situ hybridization using advanced micros- 
copy and image analysis tools to analyse in 3D the radial 
positions of 10 chromosomes (chr1, chr4, chr8, chr10,
chr14, chr16, chr17, chr18, chr19 and chr21). Most of
the chromosomes occupied specific nuclear positions
and had small variance in inter-homolog
distance (28). The nuclear localization and inter-homolog
distances estimated in that study were comparable
with those estimated in other human cell types (28,29).

RESULTS 

Data sets analysed 

We used RT data measured using a massively parallel 
sequencing-based technique across multiple human cell 
types (3). Some genomic regions replicated early (or late) 
irrespective of cell types (noted as constant early or
constant late RT regions, respectively), whereas others
had variable patterns. Throughout this article we 
focused on the genomic regions that were classified as 
constant early RT (total length 585.13 Mb) and constant 
late RT (total length 521.14 Mb). 

We obtained the LOH data as available for 597 GBM
(11) and 591 ovarian cancer (12) samples from TCGA. We
performed extensive quality control steps, excluding the
samples with potential systematic biases (e.g. batch
effects, low signal-to-noise ratio), and also the LOH
events that were likely to occur via heterozygous
deletion (see Methods and Supplementary Module
SM1). Our final data set had 22 392 and 74 415 copy
neutral LOH events in 363 GBM and 513 ovarian
cancer samples, respectively. 

Genomic regions affected by LOH events are replicated 
predominantly early 

HR-mediated repair can initiate near one end point of 
LOH events and proceed unidirectionally to the other 
end point, or start somewhere between and proceed 
bidirectionally up to the two end points of the LOH 
events. To investigate DNA RT patterns of the LOH 
events after considering these possibilities, we adopted 
three metrics, analysing DNA RT patterns — (i) at the 
LOH end points, (ii) over the length of the LOH events 
and (iii) focusing on only the small (<10kb) LOH events, 
which are likely to have the same RT throughout the 
length. 

First metric 

To study DNA RT patterns at the LOH end points, we 
overlaid RT data and the LOH end points from TCGA 
ovarian cancer samples (12) on the human reference 
genome (Figure 1A), and found that 40 189 and 21 621
LOH end points occurred in constant early and constant 
late RT regions, respectively. There were, on average, 
0.134 LOH end points per megabase (Mb) per sample in 
the constant early RT regions, and 0.081 LOH end points 
per Mb per sample in constant late RT regions in the 
filtered ovarian cancer data set. We compared the propor- 
tion of LOH end points in early (or late) RT regions with 
that expected by chance using permutation analysis (see 
Methods for details), and found that the observed prefer- 
ence for LOH end points to occur in the early RT regions 
was significantly higher compared with that expected by 
chance (permutation test; P-value <1 × 10⁻³; Figure 1B).
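
The per-megabase densities quoted above follow from the end point counts, the total lengths of the constant early and constant late RT regions, and the number of filtered ovarian cancer samples. The short Python check below assumes that normalization is by all 513 filtered samples; under that assumption it reproduces the reported values. The function name is illustrative.

```python
def endpoints_per_mb_per_sample(n_endpoints, region_mb, n_samples):
    """Average number of LOH end points per megabase per sample."""
    return n_endpoints / (region_mb * n_samples)


# Counts and region sizes as reported in the text for the ovarian cancer data set.
early_density = endpoints_per_mb_per_sample(40189, 585.13, 513)
late_density = endpoints_per_mb_per_sample(21621, 521.14, 513)
print(round(early_density, 3), round(late_density, 3))  # 0.134 0.081
```

With the same counts, the proportion of early-or-late end points that fall in early RT regions is 40189 / (40189 + 21621), or about 0.65.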

We then repeated the analyses for TCGA GBM samples 
(11) and found that there were, on average, 0.055 LOH 
end points per Mb per sample in the constant early RT 
regions and 0.040 LOH end points per Mb per sample in 
constant late RT regions in the filtered data set. Once 
again, a permutation analysis revealed that in GBM 
samples, LOH end points also preferentially occurred in 
early RT regions (permutation test; P-value <1 × 10⁻³;
Figure 1C). 

To examine whether the aggregated patterns are biased
by a small number of outlier samples, we repeated the
analyses for individual samples. Although the small number
of LOH events in individual samples made the trends
noisier, we found similar patterns for a majority of the
GBM and ovarian cancer samples (Figure 1D), highlighting
that our aggregated results were not due to certain
outlier samples.

Figure 1. (A) A schematic representation showing patterns of DNA RT at the LOH end points. Comparisons between the observed proportions of
the (B) ovarian cancer and (C) GBM LOH end points in early RT regions (dashed vertical bar) with that expected when the LOHs are shuffled
across the genome. (D) Proportion of the individual ovarian cancer and GBM samples, where observed proportion of LOH end points in early RT
regions is higher than that in the shuffled distribution. (E) A schematic representation showing overlap between genomic regions affected by LOH
events and early RT regions. Comparisons between the observed proportions of the length of the (F) ovarian cancer and (G) GBM LOH events
covered by early RT regions (dashed vertical bar) with that expected when the LOHs are shuffled throughout the genome. (H) Proportion of the
individual ovarian cancer and GBM samples, where observed proportion of the length of LOH events in early RT regions is higher than that in the
shuffled distribution. We also obtained consistent results using alternative permutation approaches, as described in the Supplementary Module SM3.

Next, we calculated how often both the end points of 
LOH events in TCGA ovarian cancer (12) and GBM (11) 
samples resided in similar (i.e. both end points in constant 
early or constant late) or different (i.e. one end point in 
constant early and the other end point in constant late) 
RT regions (Methods). We found that the observed pro- 
portion of LOH events with early RT at both end points 
was significantly higher compared with that expected by 
chance (permutation test; P-value <1 × 10⁻³) for both the
ovarian cancer and GBM data sets, and that our
aggregated results were not biased by outlier samples 
(Supplementary Module SM2). Taken together, our 
findings suggest that LOH end points preferentially 
occurred in early RT regions. 

Second metric 

To study RT patterns over the length of the LOH events, 
we calculated the proportion of the length of the genomic 
region affected by LOH events that replicated early and 
those that replicated late during the S phase (see Methods, 
Figure 1E). We found that the proportion of genomic
regions affected by LOH events that replicated early
was higher than the proportion that replicated late, and
the trend was statistically significant compared with that
expected by chance (permutation test; P-value <1 × 10⁻²,
Figure 1F-G), and that our aggregated results were not
due to certain outlier samples (Figure 1H).

Third metric 

We then focused on the small (<10kb) LOH events, the
majority of which are likely to have the same RT across their
length. For both ovarian cancer and GBM samples, using
analytical approaches similar to those described previously,
we found that the small LOH events were also significantly
more likely to have early RT at their end points and
also over their length, compared with that expected by
chance (permutation test; P-value <1 × 10⁻³).

Finally, we carried out extensive control calculations to 
account for potential caveats. We performed additional 
permutation analysis by: (i) randomizing the LOH 
events only within the same chromosomes, (ii) after ex- 
cluding centromere regions and (iii) grouping the LOH 
events as those <1, 1-5 and >5Mb in size, and in each 
case found consistent results for both the cancer data sets 
(Supplementary Module SM3). We found similar results 
irrespective of the germ line and somatic mutation status 
at the BRCA1 and BRCA2 loci (Supplementary Module 
SM3). DNA RT is correlated with many genomic and 
epigenomic features. Integrating chromatin (26), cytogen- 
etic banding patterns (30) and GC content (13) data, we 
found that our results are consistent even after controlling 
for these potential covariates (Supplementary Module 
SM3). Integrating long-range interaction and repeat 
element data, we found that the two end points of LOH 
events frequently harbor similar repeat classes, and also 
are in proximity of each other in the 3D nucleus 
(Supplementary Module SM3); these attributes might
facilitate co-operative HR-mediated repair within the
same replication factory, but further studies are war- 
ranted. Taken together, our findings suggest that LOH 
events preferentially occur in early RT regions, and the 
results are similar across different cancer types, and 
robust toward the choice of data sets and statistical 
approaches. 

LOH end points have different RT preferences compared 
with other types of genomic alteration 

Different classes of genomic alterations, e.g. point muta- 
tions, somatic copy number alterations and LOH arise 
because of erroneous repair of DNA lesions by various 
DNA repair pathways. Recently, it was reported that 
local DNA RT also affects the patterns of point mutations 
(4-6) and copy number alterations (4,7,8) — point muta- 
tions are enriched in late replicating regions, and end 
points of somatic copy number alterations, especially de- 
letions, occur at a high frequency in late replicating 
regions (10). Here we report that, in contrast, the
LOH end points selectively occur in early replicating 
regions in multiple cancer types (Figure 2A). The differ- 
ence in RT patterns between these distinct classes of 
genomic alterations led us to ask whether the DNA 
repair pathways, especially the HR pathway that 
mediates LOH events (1), also show systematic changes 
in expression during different phases of the cell cycle. 

HR pathway genes are active during early replication 

We surveyed the temporal pattern of expression of the 
genes involved in the canonical DNA double-strand 
repair pathways during the cell cycle in yeast and 
humans. We obtained data on the dynamic expression 
pattern of the genes in the HR (i.e. RAD50, RAD51, 
RAD52, RAD54, BRCA2, XRCC2, XRCC3, NBN,
MRE11, MUS81, GEN1, SHFM1, RBBP8) and NHEJ 
pathway (i.e. KU70, KU80, LIG4, HYRC, XRCC4) from 
multiple independent experiments (16-20) as deposited in 
the Cyclebase 2.0 (21). We found that mRNA expression 
of RAD51 and RAD54, which are important for initiation 
of HR-mediated repair, was high during early replication 
(G1-S phase) and decreased rapidly afterward (S-G2
phase); the pattern was consistent across independent
experiments in both humans and yeast, and showed significant
periodicity and low variance (Figure 2B-F; P(per)
<1 × 10⁻⁵; see Methods for periodicity and variance
calculation). Expression of other genes in the HR pathway,
or those involved in the NHEJ pathway, did not show 
distinct cell cycle specific pattern (Supplementary 
Module SM4). Although we could not examine protein- 
level expression and post-transcriptional modifications on 
these genes, the observed findings are consistent with the 
model that HR-mediated repair is active even during early 
stages of DNA replication. This is in agreement with the 
report by Kadyk and Hartwell (1992) that HR-mediated 
DNA repair using homologous chromosomes leading to 
LOH can occur during the G1 stage of the cell cycle (31). It
prompted us to investigate whether certain types of
replication stress might trigger HR-mediated repair during
early replication using homologous chromosomes in the
nucleus.

Figure 2. (A) Point mutations and copy number alteration (especially deletion) end points are prevalent in late RT regions, but LOH end points are
more common in early replicating regions. Temporal patterns of expression of (B) RAD54B, (C) RAD54L, (D) RBBP8, (E) RAD54 and (F) RAD51
during cell cycle in yeast and human cell lines, derived from multiple independent experiments. Measure of periodicity P(per), variance between
experiments P(reg) and peak time of expression are listed for each gene, as obtained from CycleBase 2.0 (21). High periodicity and tight regulation
are indicated by small values of P(per) and P(reg), respectively.

LOH events overlap with sites of high RNA polII
occupancy 

We next investigated whether certain genomic features, 
which are commonly associated with replicative stress, 
also significantly overlap with LOH events (Figure 3A). 
Common fragile sites are frequent sources of genomic in- 
stability (23). We first analysed whether the LOH events 
significantly overlap with the 76 well-characterized 
common fragile sites (23) and the newly reported early 
replicating fragile sites (ERFS) (24), which are implicated 
in somatic copy number alterations and translocations in 
lymphoma subtypes, respectively. Interestingly, we did not 
observe any significant overlap between short (<10kb; the
third metric) LOH events and either class of fragile sites
(CFS or ERFS; permutation test; P-value >5 × 10⁻²;
Figure 3B) described above. The results were similar
using the other two metrics as well. 

Even though transcription and replication are meant to
be spatially segregated, collision of the replication fork with
paused RNA polII is another key cause of replicative
stress (32). Combining transcription start sites and RNA
polII occupancy data from multiple ENCODE cell lines,
we found that the sites of high RNA polII occupancy
significantly overlapped with transcription start sites
(permutation test; P-value <1 × 10⁻⁴), which is consistent
with the reports that promoter-proximal RNA polII
pausing is common (33). Integrating LOH data from
GBM and ovarian cancer samples, and using the third
metric (LOH of size <10kb), we found that both the
transcription start sites of genes and the sites of RNA polII
occupancy significantly overlapped with the short
(<10kb) LOH events (permutation test; P-value
<1 × 10⁻³; Figure 3B). We did not have spatial resolution
in the data sets to test whether RNA polII paused at the
pre-initiation complex or in early elongation contributed to
this pattern. Nevertheless, a vast majority of these genes
were expressed in the tumor samples and also in matched
normal controls (12). Although the sites of RNA polII
occupancy, present in two or more ENCODE cell lines,
accounted for <1.5% of the early replicating regions, they
overlapped with more than 10% of the short (<10kb)
LOH events in ovarian cancer samples.

Figure 3. (A) Summary of different genomic features analysed in the context of LOH. (B) Comparisons of the observed (gray vertical line) extent of
overlap between these features with short (<10kb) LOH events in GBM and ovarian cancer samples, with that expected (light gray bars) when the
LOHs are shuffled across the genome.

We found similar
evidence for preferential overlap between the sites of RNA
polII occupancy and LOH events using the other two
metrics as well (Supplementary Module SM5), and even
after adjusting for potential covariates such as GC
content, gene density and size of the LOH events
(permutation test; P-value <5 × 10⁻²). Even though we could not
test every possible covariate or analyse S-phase
RNA polII occupancy data from the same samples, the
consistency of our findings across multiple data sets and
analytical approaches hints that these issues are
unlikely to bias our conclusions. Interpreting our
findings in the context of recent reports (24,33-36), it is
tempting to suggest that during early S phase, replicative
stress in the vicinity of RNA polII (37), which is trapped
in the pre-initiation complex or paused in early elongation
(38), can invoke HR-mediated repair.

LOH frequency inversely correlates with inter-homologous 
chromosome distance 

Before and during early replication (G1-S phase),
homologous chromosomes frequently contribute templates for
HR-mediated rescue of replication (31). Eukaryotic
chromosomes occupy distinct nuclear territories, such
that some pairs of homologous chromosomes (e.g. chr19)
are closer to each other than other pairs (e.g. chr4;
Figure 4A). Despite cell type-specific variation in nuclear
organization, some chromosome pairs have shorter 
distance than others in the nucleus across different cell 
types (28,29). We investigated whether the relative fre- 
quency of LOH events differed between human chromo- 
somes and whether relative proximity of homologous 
chromosomes correlated with this pattern. Indeed, overlaying
inter-chromosomal distance data (28), we found that
the relative frequency of LOH events per chromosome
was significantly (Pearson correlation test; P-value
<5 × 10⁻²) inversely correlated with inter-homolog
distance in the nucleus (Figure 4B-C) for both GBM and
ovarian cancer. Our findings were generally robust toward
variation in inter-homologous chromosome distance 
(Supplementary Module SM6). We also obtained similar 
results using 3D fluorescence in situ hybridization-based 
inter-chromosome distance data for human fibroblast (29) 
(Supplementary Module SM6). It is likely that during early 
replication, when sister chromatids are forming, proximity 
of homologous chromosome copies is a key factor affecting 
HR-mediated repair leading to gene conversion and LOH. 
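
The correlation test used above can be illustrated with a small, self-contained Python sketch. The numbers in the example are invented toy values, not the measured inter-homolog distances or LOH frequencies, and the function names are hypothetical. The sketch computes the Pearson coefficient and the corresponding t statistic; converting the latter to a P-value requires the t distribution with n - 2 degrees of freedom, which is left out here.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Toy values only: a larger distance paired with a lower LOH frequency.
distance = [1.0, 2.0, 3.0, 4.0]
loh_frequency = [4.0, 3.0, 1.0, 2.0]
r = pearson_r(distance, loh_frequency)
print(r, t_statistic(r, len(distance)))
```

A negative coefficient corresponds to the inverse relationship reported here: chromosomes whose homologs lie closer together show a higher relative LOH frequency.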



DISCUSSION 

Taken together, we have demonstrated that (i) LOH
events preferentially occur in early RT regions, which is
consistent with the temporal patterns of expression of HR
pathway genes, (ii) RT preference for LOH events
contrasts with that for point mutations and somatic copy number
alterations, (iii) LOH events significantly overlap with
sites of high RNA polII occupancy near transcription
start sites and (iv) the relative frequency of LOH events
in human chromosomes correlates inversely with the distance
between homologous chromosomes in the nucleus. The
preference for early RT was observed irrespective of the
size of the LOH events, and mutation status of BRCA1
and BRCA2.

Figure 4. (A) The relative frequency of LOH events (per sample, per bp) in different chromosomes for the TCGA ovarian cancer and GBM samples.
(B) Different chromosomes have different inter-homolog distance inside the nucleus. Scatterplot showing distribution of inter-homolog distance of
human chromosomes against the relative frequency of LOH events per chromosome in the (C) ovarian cancer and (D) GBM samples adjusted by
chromosome lengths.

RNA polII pausing at the pre-initiation complex or in early
elongation is widespread in metazoans including humans
(33,38). Paused RNA polII is known to interfere with the
advancing replisome, contributing to replication stress
(37) and R-loop formation (39), and to induce HR-mediated
rescue (35). Although such repair is predominantly done
using the sister chromatid as a template, the homologous 
chromosome copy may also be used, although at a lower 
frequency (31). The relative location of the homologous 
sequence, derived from sister chromatid or homologous 
chromosome, is suspected to influence the choice of 
template and the efficiency of HR-mediated repair (31). 
In light of these observations, our findings are consistent 
with a model that during early replication when sister 
chromatids are forming, HR-mediated rescue of replication
forks near paused RNA polII using homologous
chromosomes leads to LOH events in cancer genomes. 

We also note potential caveats of the analysis. First, we 
prefer to take a conservative stance while inferring causal- 
ity from correlation. Because data on chromatin, long- 
range interactions and temporal expression of the HR 
and NHEJ pathway genes were not derived from the 
same samples, we cautiously interpreted the findings. 
Second, we acknowledge that RNA polII pausing could
be one of the factors that contribute to replicative stress 
leading to HR-mediated rescue (35,36), and many of these 
factors can be inter-related; thus, a more comprehensive 
survey is required to estimate their effects during early rep- 
lication. Third, we were unable to consider intra-tumor 
heterogeneity, tissue-specific variation in RT and post- 
transcriptional modifications on the DNA repair pathway 
genes during cell cycle in our analysis. Nevertheless, our 
results are consistent across different tumor types, robust 
against the choice of data sets, size classes of LOH events, 
statistical approaches and potential covariates. Moreover, 
they are in agreement with current literature regarding the 
sources of replicative stress and HR-mediated repair. So, 
we anticipate that these issues are unlikely to bias our con- 
clusions. Nevertheless, independent validation of our 
findings would establish the conclusions firmly. 

Our findings highlight an important distinction between 
LOH and other classes of genomic alterations such as 
point mutations and somatic copy number alterations. 
Point mutations (4-6,40,41) and somatic copy number al- 
terations (particularly deletions) (4,7,8) frequently occur 
in late RT regions. In contrast, we found that LOH 
events preferentially occur in early RT regions. In the 
early RT regions, which are also enriched in protein-coding
genes (3), LOH-mediated gene conversion can
potentially replace wild-type alleles with recessive deleterious
alleles, leading to increased risk of manifestation of
recessive deleterious traits, complicating the resulting
phenotype in the affected individuals. Damage-induced
hypermutability and error-prone repair of such regions 
could lead to further genetic changes (42,43). 
Furthermore, the difference in RT preference between
different classes of genomic alterations also suggests a
testable hypothesis: that replicating cells show a
changing preference between various DNA repair
pathways, which have different levels of efficiency and
fidelity (1), as replication progresses.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online:
Supplementary Modules SM1-SM6.